Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28669: Deadlock found when TxnStoreMutex trying to acquireLock #5585

Merged
merged 4 commits into from
Jan 8, 2025

Conversation

dengzhhu653
Copy link
Member

What changes were proposed in this pull request?

Resolving the Deadlock when the back db is MySQL

Why are the changes needed?

By default MySQL default isolation level is REPEATABLE-READ, for update in this isolation will hold the gap lock, if multiple clients are trying to for update and then insert into the same gap, it cloud cause the deadlock.

Does this PR introduce any user-facing change?

A new hive.metastore.have.multiple.leaders, if it's false, then the same housekeeping tasks will not leverage the db mutex to block each other.

Is the change a dependency upgrade?

No

How was this patch tested?

Testing the PR locally, querying from mysql information_schema.INNODB_TRX table, showing that trx_isolation_level of the new trx is 'READ COMMITTED'

@Aggarwal-Raghav
Copy link
Contributor

Based on the comments on JIRA and I have tested this patch on my local and haven't seen the error stacktrace, hence
LGTM +1 (non-binding)

@deniskuzZ
Copy link
Member

@dengzhhu653, before HIVE-27481 we always used READ_COMMITTED isolation level in TxnHandler:

@Override
  public LockHandle acquireLock(String key) throws MetaException {
    /**
     * The implementation here is a bit kludgey but done so that code exercised by unit tests
     * (which run against Derby which has no support for select for update) is as similar to
     * production code as possible.
     * In particular, with Derby we always run in a single process with a single metastore and
     * the absence of For Update is handled via a Semaphore.  The later would strictly speaking
     * make the SQL statements below unnecessary (for Derby), but then they would not be tested.
     */
    Connection dbConn = null;
    Statement stmt = null;
    ResultSet rs = null;
    boolean needToCloseConn = true;
    try {
      try {
        String sqlStmt = sqlGenerator.addForUpdateClause("SELECT \"MT_COMMENT\", \"MT_KEY2\" FROM \"AUX_TABLE\" WHERE \"MT_KEY1\"=" + quoteString(key));
        lockInternal();
        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED, connPoolMutex);
        stmt = dbConn.createStatement();
        LOG.debug("About to execute SQL: {}", sqlStmt);
        rs = stmt.executeQuery(sqlStmt);

if that is changed to default, should we restore the READ_COMMITTED?

public TransactionContext getNewTransaction(int propagation) {
TransactionContext context = new TransactionContext(realTransactionManager.getTransaction(
new DefaultTransactionDefinition(propagation)), this);
public TransactionContext getNewTransaction(Transactional transactional) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not create a parameterless method and use Propagation.REQUIRED as default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this transactional is retrieved from the method annotation,
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java#L281
each method might have its own definition, though we only care about the isolation level and propagation now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by default, the isolation is default, and we don't specify another level elsewhere.
we don't introduce the isolation() in the TxnStore, the level can be specified via:

  @Transactional(value = POOL_TX, isolation = Isolation.READ_COMMITTED, propagation = Propagation.REQUIRED, timeout = 30)
  GetOpenTxnsResponse getOpenTxns() throws MetaException;

Copy link
Member

@deniskuzZ deniskuzZ Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about:

  public TransactionContext getNewTransaction(int propagation, int isolation) {
    DefaultTransactionDefinition transactionDefinition = new DefaultTransactionDefinition(propagation);
    transactionDefinition.setIsolationLevel((isolation != Isolation.DEFAULT.value()) ? 
        isolation : ISOLATION_READ_COMMITTED);
    TransactionContext context = new TransactionContext(
        realTransactionManager.getTransaction(transactionDefinition), this);
    contexts.set(context);
    return context;
  }

  public TransactionContext getNewTransaction(int propagation) {
    return getNewTransaction(propagation, ISOLATION_READ_COMMITTED);
  }
....

 context = jdbcResource.getTransactionManager().getNewTransaction(
    transactional.propagation().value(), transactional.isolation().value());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@deniskuzZ deniskuzZ Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, you decided to get rid of the isolation level? that's fine, but in that case, why do we need 2nd method without the propagation arg?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, removed the 2nd method

@dengzhhu653
Copy link
Member Author

@dengzhhu653, before HIVE-27481 we always used READ_COMMITTED isolation level in TxnHandler:

@Override
  public LockHandle acquireLock(String key) throws MetaException {
    /**
     * The implementation here is a bit kludgey but done so that code exercised by unit tests
     * (which run against Derby which has no support for select for update) is as similar to
     * production code as possible.
     * In particular, with Derby we always run in a single process with a single metastore and
     * the absence of For Update is handled via a Semaphore.  The later would strictly speaking
     * make the SQL statements below unnecessary (for Derby), but then they would not be tested.
     */
    Connection dbConn = null;
    Statement stmt = null;
    ResultSet rs = null;
    boolean needToCloseConn = true;
    try {
      try {
        String sqlStmt = sqlGenerator.addForUpdateClause("SELECT \"MT_COMMENT\", \"MT_KEY2\" FROM \"AUX_TABLE\" WHERE \"MT_KEY1\"=" + quoteString(key));
        lockInternal();
        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED, connPoolMutex);
        stmt = dbConn.createStatement();
        LOG.debug("About to execute SQL: {}", sqlStmt);
        rs = stmt.executeQuery(sqlStmt);

if that is changed to default, should we restore the READ_COMMITTED?

I have made the default as the READ_COMMITTED when calling the TxnStore's method unless it has specified the level:
https://github.com/apache/hive/pull/5585/files#diff-63125a22ca15ef76a1e6b5a52d230e9b24c816cbd15442e3ba640da2f4c167d0R113
https://github.com/apache/hive/pull/5585/files#diff-080bfb902495302473126b9827a086cf164e63d8742c3b80082469b01b405a48R66-R67

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1, pending tests

Copy link

sonarqubecloud bot commented Jan 8, 2025

@dengzhhu653 dengzhhu653 merged commit 1836187 into apache:master Jan 8, 2025
4 checks passed
@dengzhhu653 dengzhhu653 deleted the HIVE-28669 branch January 8, 2025 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants